2.
J Am Soc Mass Spectrom ; 35(3): 542-550, 2024 Mar 06.
Article in English | MEDLINE | ID: mdl-38310603

ABSTRACT

Automation is dramatically changing the nature of laboratory life science. Robotic lab hardware that can perform manual operations with greater speed, endurance, and reproducibility opens an avenue for faster scientific discovery with less time spent on laborious repetitive tasks. A major bottleneck remains in integrating cutting-edge laboratory equipment into automated workflows, notably specialized analytical equipment, which is designed for human usage. Here we present AutonoMS, a platform for automatically running, processing, and analyzing high-throughput mass spectrometry experiments. AutonoMS is currently written around an ion mobility mass spectrometry (IM-MS) platform and can be adapted to additional analytical instruments and data processing flows. AutonoMS enables automated software agent-controlled end-to-end measurement and analysis runs from experimental specification files that can be produced by human users or upstream software processes. We demonstrate the use and abilities of AutonoMS in a high-throughput flow-injection ion mobility configuration with 5 s sample analysis time, processing robotically prepared chemical standards and cultured yeast samples in targeted and untargeted metabolomics applications. The platform exhibited consistency, reliability, and ease of use while eliminating the need for human intervention in the process of sample injection, data processing, and analysis. The platform paves the way toward a more fully automated mass spectrometry analysis and ultimately closed-loop laboratory workflows involving automated experimentation and analysis coupled to AI-driven experimentation utilizing cutting-edge analytical instrumentation. AutonoMS documentation is available at https://autonoms.readthedocs.io.


Subject(s)
Metabolomics; Software; Humans; Reproducibility of Results; Mass Spectrometry; Automation
3.
Bioinformatics ; 40(2)2024 02 01.
Article in English | MEDLINE | ID: mdl-38273672

ABSTRACT

MOTIVATION: Proteomic profiles reflect the functional readout of the physiological state of an organism. An increased understanding of what controls and defines protein abundances is of high scientific interest. Saccharomyces cerevisiae is a well-studied model organism, and there is a large amount of structured knowledge on yeast systems biology in databases such as the Saccharomyces Genome Database, and highly curated genome-scale metabolic models like Yeast8. These datasets, the result of decades of experiments, are abundant in information, and adhere to semantically meaningful ontologies. RESULTS: By representing this knowledge in an expressive Datalog database we generated data descriptors using relational learning that, when combined with supervised machine learning, enable us to predict protein abundances in an explainable manner. We learnt predictive relationships between protein abundances, function and phenotype, such as α-amino acid accumulations and deviations in chronological lifespan. We further demonstrate the power of this methodology on the proteins His4 and Ilv2, connecting qualitative biological concepts to quantified abundances. AVAILABILITY AND IMPLEMENTATION: All data and processing scripts are available at the following GitHub repository: https://github.com/DanielBrunnsaker/ProtPredict.


Subject(s)
Saccharomyces cerevisiae Proteins; Saccharomyces cerevisiae; Saccharomyces cerevisiae/genetics; Proteomics; Saccharomyces cerevisiae Proteins/genetics; Systems Biology/methods; Phenotype
4.
Bioinform Adv ; 3(1): vbad102, 2023.
Article in English | MEDLINE | ID: mdl-37600845

ABSTRACT

Summary: Artificial intelligence (AI)-driven laboratory automation, which combines robotic labware and autonomous software agents, is a powerful trend in modern biology. We developed Genesis-DB, a database system designed to support AI-driven autonomous laboratories by providing software agents access to large quantities of structured domain information. In addition, we present a new ontology for modeling data and metadata from autonomously performed yeast microchemostat cultivations in the framework of the Genesis robot scientist system. We show an example of how Genesis-DB enables the research life cycle by modeling yeast gene regulation, guiding future hypothesis generation and design of experiments. Genesis-DB supports AI-driven discovery through automated reasoning and its design is portable, generic, and easily extensible to other AI-driven molecular biology laboratory data and beyond. Availability and implementation: Genesis-DB code and installation instructions are available at the GitHub repository https://github.com/TW-Genesis/genesis-database-system.git. The database use case demo code and data are also available through GitHub (https://github.com/TW-Genesis/genesis-database-demo.git). The ontology can be downloaded here: https://github.com/TW-Genesis/genesis-ontology/releases/download/v0.0.23/genesis.owl. The ontology term descriptions (including mappings to existing ontologies) and maintenance standard operating procedures can be found at: https://github.com/TW-Genesis/genesis-ontology.

5.
Bioinformatics ; 39(8)2023 08 01.
Article in English | MEDLINE | ID: mdl-37572302

ABSTRACT

MOTIVATION: Molecular docking is a commonly used approach for estimating binding conformations and their resultant binding affinities. Machine learning has been successfully deployed to enhance such affinity estimations. Many methods of varying complexity have been developed making use of some or all of the spatial and categorical information available in these structures. The evaluation of such methods has mainly been carried out using datasets from PDBbind, particularly the Comparative Assessment of Scoring Functions (CASF) 2007, 2013, and 2016 datasets with dedicated test sets. This work demonstrates that only a small number of simple descriptors is necessary to efficiently estimate binding affinity for these complexes without the need to know the exact binding conformation of a ligand. RESULTS: The developed approach of using a small number of ligand and protein descriptors in conjunction with gradient boosting trees demonstrates high performance on the CASF datasets. This includes the commonly used benchmark CASF2016 where it appears to perform better than any other approach. This methodology is also useful for datasets where the spatial relationship between the ligand and protein is unknown as demonstrated using a large ChEMBL-derived dataset. AVAILABILITY AND IMPLEMENTATION: Code and data have been uploaded to https://github.com/abbiAR/PLBAffinity.


Subject(s)
Machine Learning; Proteins; Molecular Docking Simulation; Ligands; Protein Binding; Proteins/chemistry
6.
NPJ Syst Biol Appl ; 9(1): 11, 2023 04 07.
Article in English | MEDLINE | ID: mdl-37029131

ABSTRACT

Saccharomyces cerevisiae is a very well studied organism, yet ∼20% of its proteins remain poorly characterized. Moreover, recent studies seem to indicate that the pace of functional discovery is slow. Previous work has implied that the most probable path forward is via not only automation but fully autonomous systems in which active learning is applied to guide high-throughput experimentation. Development of tools and methods for these types of systems is of paramount importance. In this study we use constrained dynamical flux balance analysis (dFBA) to select ten regulatory deletant strains that are likely to have previously unexplored connections to the diauxic shift. We then analyzed these deletant strains using untargeted metabolomics, generating profiles which were subsequently investigated to better understand the consequences of the gene deletions in the metabolic reconfiguration of the diauxic shift. We show that metabolic profiles can be utilised to gain insight not only into cellular transformations such as the diauxic shift, but also into the regulatory roles and biological consequences of regulatory gene deletion. We also conclude that untargeted metabolomics is a useful tool for guidance in high-throughput model improvement, and is a fast, sensitive and informative approach appropriate for future large-scale functional analyses of genes. Moreover, it is well-suited for automated approaches due to the relative simplicity of processing and the potential to make it massively high-throughput.


Subject(s)
Saccharomyces cerevisiae Proteins; Saccharomyces cerevisiae; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism; Saccharomyces cerevisiae Proteins/genetics; Saccharomyces cerevisiae Proteins/metabolism; Metabolomics/methods
7.
BMC Bioinformatics ; 23(1): 323, 2022 Aug 06.
Article in English | MEDLINE | ID: mdl-35933367

ABSTRACT

BACKGROUND: A key problem in bioinformatics is that of predicting gene expression levels. There are two broad approaches: use of mechanistic models that aim to directly simulate the underlying biology, and use of machine learning (ML) to empirically predict expression levels from descriptors of the experiments. There are advantages and disadvantages to both approaches: mechanistic models more directly reflect the underlying biological causation, but do not directly utilize the available empirical data; while ML methods do not fully utilize existing biological knowledge. RESULTS: Here, we investigate overcoming these disadvantages by integrating mechanistic cell signalling models with ML. Our approach to integration is to augment ML with similarity features (attributes) computed from cell signalling models. Seven different sets of similarity features were generated using graph theory. Each set of features was in turn used to learn multi-target regression models. All of the feature sets significantly improved accuracy over the baseline model, which was trained without the similarity features. Finally, the seven multi-target regression models were stacked together to form an overall prediction model that was significantly better than the baseline on 95% of genes on an independent test set. The similarity features enable this stacking model to provide interpretable knowledge about cancer, e.g. the role of ERBB3 in the MCF7 breast cancer cell line. CONCLUSION: Integrating mechanistic models as graphs helps both to improve the predictive results of machine learning models and to provide biological knowledge about genes that can help in building state-of-the-art mechanistic models.
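
As a rough illustration of what a graph-derived similarity feature could look like, the Python sketch below scores a gene against a set of reference genes by the overlap of their neighbour sets in a signalling graph. The abstract does not say which graph measures were used, so the Jaccard overlap, the function names, and the adjacency-dictionary input are all assumptions made for brevity.

```python
def jaccard_neighbour_similarity(graph, a, b):
    """Jaccard overlap of the neighbour sets of nodes a and b.

    `graph` maps each node name to the set of its neighbours.
    Returns 0.0 when neither node has any neighbours.
    """
    neighbours_a = graph.get(a, set())
    neighbours_b = graph.get(b, set())
    union = neighbours_a | neighbours_b
    return len(neighbours_a & neighbours_b) / len(union) if union else 0.0


def similarity_features(graph, gene, reference_genes):
    """One feature per reference gene: its graph similarity to `gene`."""
    return [jaccard_neighbour_similarity(graph, gene, ref) for ref in reference_genes]
```

A feature vector built this way could then be appended to the experimental descriptors before fitting a regression model; the node names in any real use would come from the signalling model itself.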


Subject(s)
Machine Learning; Neoplasms; Computational Biology/methods; Gene Expression; Humans
8.
R Soc Open Sci ; 9(5): 211745, 2022 May.
Article in English | MEDLINE | ID: mdl-35573039

ABSTRACT

The representation of the protein-ligand complexes used in building machine learning models plays an important role in the accuracy of binding affinity prediction. The Extended Connectivity Interaction Features (ECIF) is one such representation. We report that (i) including the discretized distances between protein-ligand atom pairs in the ECIF scheme improves predictive accuracy, and (ii) in an evaluation using gradient boosted trees, we found that the resampling method used in selecting the best hyperparameters has a strong effect on predictive performance, especially for benchmarking purposes.
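
The idea of adding discretized distances to pair-count features can be sketched in a few lines of Python. This is a simplified illustration, not the published ECIF code: atom types are reduced to bare element symbols, and the bin edges and function name are assumptions.

```python
import math
from collections import Counter


def binned_pair_counts(protein_atoms, ligand_atoms, bin_edges=(2.0, 4.0, 6.0)):
    """Count protein-ligand atom-type pairs per distance bin.

    Each atom is (atom_type, (x, y, z)). A pair falls into the first bin
    whose upper edge is >= its distance; pairs beyond the last edge are
    ignored. The bin edges are illustrative values only.
    """
    counts = Counter()
    for p_type, p_xyz in protein_atoms:
        for l_type, l_xyz in ligand_atoms:
            distance = math.dist(p_xyz, l_xyz)
            for bin_index, upper_edge in enumerate(bin_edges):
                if distance <= upper_edge:
                    counts[(p_type, l_type, bin_index)] += 1
                    break
    return counts
```

Without the `bin_index` component of the key, the counter reduces to plain pair counts within a single cutoff; keeping it lets a downstream model distinguish close contacts from distant ones.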

9.
J R Soc Interface ; 19(189): 20210821, 2022 04.
Article in English | MEDLINE | ID: mdl-35382578

ABSTRACT

Scientific results should not just be 'repeatable' (replicable in the same laboratory under identical conditions), but also 'reproducible' (replicable in other laboratories under similar conditions). Results should also, if possible, be 'robust' (replicable under a wide range of conditions). The reproducibility and robustness of only a small fraction of published biomedical results has been tested; furthermore, when reproducibility is tested, it is often not found. This situation is termed 'the reproducibility crisis', and it is one of the most important issues facing biomedicine. This crisis would be solved if it were possible to automate reproducibility testing. Here, we describe the semi-automated testing for reproducibility and robustness of simple statements (propositions) about cancer cell biology automatically extracted from the literature. From 12 260 papers, we automatically extracted statements predicted to describe experimental results regarding a change of gene expression in response to drug treatment in breast cancer; from these we selected 74 statements of high biomedical interest. To test the reproducibility of these statements, two different teams used the laboratory automation system Eve and two breast cancer cell lines (MCF7 and MDA-MB-231). Statistically significant evidence for repeatability was found for 43 statements, and significant evidence for reproducibility/robustness in 22 statements. In two cases, the automation made serendipitous discoveries. The reproduced/robust knowledge provides significant insight into cancer. We conclude that semi-automated reproducibility testing is currently achievable, that it could be scaled up to generate a substantive source of reliable knowledge and that automation has the potential to mitigate the reproducibility crisis.


Subject(s)
Breast Neoplasms; Robotics; Automation; Biology; Female; Humans; Reproducibility of Results
10.
mSystems ; 6(6): e0108721, 2021 Dec 21.
Article in English | MEDLINE | ID: mdl-34812651

ABSTRACT

The ongoing COVID-19 pandemic urges searches for antiviral agents that can block infection or ameliorate its symptoms. Using dissimilar search strategies for new antivirals will improve our overall chances of finding effective treatments. Here, we have established an experimental platform for screening of small molecule inhibitors of the SARS-CoV-2 main protease in Saccharomyces cerevisiae cells, genetically engineered to enhance cellular uptake of small molecules in the environment. The system consists of a fusion of the Escherichia coli toxin MazF and its antitoxin MazE, with insertion of a protease cleavage site in the linker peptide connecting the MazE and MazF moieties. Expression of the viral protease confers cleavage of the MazEF fusion, releasing the MazF toxin from its antitoxin, resulting in growth inhibition. In the presence of a small molecule inhibiting the protease, cleavage is blocked and the MazF toxin remains inhibited, promoting growth. The system thus allows positive selection for inhibitors. The engineered yeast strain is tagged with a fluorescent marker protein, allowing precise monitoring of its growth in the presence or absence of inhibitor. We detect an established main protease inhibitor by a robust growth increase, discernible down to 1 µM. The system is suitable for robotized large-scale screens. It allows in vivo evaluation of drug candidates and is rapidly adaptable for new variants of the protease with deviant site specificities. IMPORTANCE The COVID-19 pandemic may continue for several years before vaccination campaigns can put an end to it globally. Thus, the need for discovery of new antiviral drug candidates will remain. We have engineered a system in yeast cells for the detection of small molecule inhibitors of one attractive drug target of SARS-CoV-2, its main protease, which is required for viral replication. 
The ability to detect inhibitors in live cells brings the advantage that only compounds capable of entering the cell and remaining stable there will score in the system. Moreover, because of its design in yeast cells, the system is rapidly adaptable for tuning the detection level and eventual modification of the protease cleavage site in the case of future mutant variants of the SARS-CoV-2 main protease or even for other proteases.

11.
Proc Natl Acad Sci U S A ; 118(49)2021 12 07.
Article in English | MEDLINE | ID: mdl-34845013

ABSTRACT

Almost all machine learning (ML) is based on representing examples using intrinsic features. When there are multiple related ML problems (tasks), it is possible to transform these features into extrinsic features by first training ML models on other tasks and letting them each make predictions for each example of the new task, yielding a novel representation. We call this transformational ML (TML). TML is very closely related to, and synergistic with, transfer learning, multitask learning, and stacking. TML is applicable to improving any nonlinear ML method. We tested TML using the most important classes of nonlinear ML: random forests, gradient boosting machines, support vector machines, k-nearest neighbors, and neural networks. To ensure the generality and robustness of the evaluation, we utilized thousands of ML problems from three scientific domains: drug design, predicting gene expression, and ML algorithm selection. We found that TML significantly improved the predictive performance of all the ML methods in all the domains (4 to 50% average improvements) and that TML features generally outperformed intrinsic features. Use of TML also enhances scientific understanding through explainable ML. In drug design, we found that TML provided insight into drug target specificity, the relationships between drugs, and the relationships between target proteins. TML leads to an ecosystem-based approach to ML, where new tasks, examples, predictions, and so on synergistically interact to improve performance. To contribute to this ecosystem, all our data, code, and our ∼50,000 ML models have been fully annotated with metadata, linked, and openly published using Findability, Accessibility, Interoperability, and Reusability principles (∼100 Gbytes).
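
The extrinsic-feature transformation described in this abstract can be sketched in plain Python: models trained on prior tasks each predict for an example of the new task, and those predictions become the example's new representation. The k-nearest-neighbour learner, the function names, and the `(X, y)` task layout are illustrative assumptions, not the authors' implementation.

```python
import math


def knn_predict(train_X, train_y, x, k=3):
    """Average the targets of the k training points nearest to x."""
    ranked = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def extrinsic_features(prior_tasks, x, k=3):
    """One feature per prior task: that task's model prediction for x.

    `prior_tasks` is a list of (X, y) pairs sharing the same intrinsic features.
    """
    return [knn_predict(X, y, x, k) for X, y in prior_tasks]


def tml_predict(prior_tasks, new_X, new_y, x, k=3):
    """Predict for x on the new task using extrinsic features only."""
    # Re-represent the new task's training examples via the prior-task models.
    train_Z = [extrinsic_features(prior_tasks, xi, k) for xi in new_X]
    # Fit (here: memorise) a model in the transformed space and query it.
    return knn_predict(train_Z, new_y, extrinsic_features(prior_tasks, x, k), k)
```

The second-stage learner does not have to match the first-stage one; any method could be trained on `train_Z`, and the intrinsic and extrinsic features could also be concatenated.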

12.
Mach Learn ; 109(2): 251-277, 2020.
Article in English | MEDLINE | ID: mdl-32174648

ABSTRACT

In phenotype prediction the physical characteristics of an organism are predicted from knowledge of its genotype and environment. Such studies, often called genome-wide association studies, are of the highest societal importance, as they are of central importance to medicine, crop-breeding, etc. We investigated three phenotype prediction problems: one simple and clean (yeast), and the other two complex and real-world (rice and wheat). We compared standard machine learning methods; elastic net, ridge regression, lasso regression, random forest, gradient boosting machines (GBM), and support vector machines (SVM), with two state-of-the-art classical statistical genetics methods; genomic BLUP and a two-step sequential method based on linear regression. Additionally, using the clean yeast data, we investigated how performance varied with the complexity of the biological mechanism, the amount of observational noise, the number of examples, the amount of missing data, and the use of different data representations. We found that for almost all the phenotypes considered, standard machine learning methods outperformed the methods from classical statistical genetics. On the yeast problem, the most successful method was GBM, followed by lasso regression, and the two statistical genetics methods; with greater mechanistic complexity GBM was best, while in simpler cases lasso was superior. In the wheat and rice studies the best two methods were SVM and BLUP. The most robust method in the presence of noise, missing data, etc. was random forests. The classical statistical genetics method of genomic BLUP was found to perform well on problems where there was population structure. This suggests that standard machine learning methods need to be refined to include population structure information when this is present. 
We conclude that the application of machine learning methods to phenotype prediction problems holds great promise, but that determining which methods are likely to perform well on any given problem is elusive and non-trivial.

13.
Proc Natl Acad Sci U S A ; 116(36): 18142-18147, 2019 09 03.
Article in English | MEDLINE | ID: mdl-31420515

ABSTRACT

One of the most challenging tasks in modern science is the development of systems biology models: Existing models are often very complex but generally have low predictive performance. The construction of high-fidelity models will require hundreds/thousands of cycles of model improvement, yet few current systems biology research studies complete even a single cycle. We combined multiple software tools with integrated laboratory robotics to execute three cycles of model improvement of the prototypical eukaryotic cellular transformation, the yeast (Saccharomyces cerevisiae) diauxic shift. In the first cycle, a model outperforming the best previous diauxic shift model was developed using bioinformatic and systems biology tools. In the second cycle, the model was further improved using automatically planned experiments. In the third cycle, hypothesis-led experiments improved the model to a greater extent than achieved using high-throughput experiments. All of the experiments were formalized and communicated to a cloud laboratory automation system (Eve) for automatic execution, and the results stored on the semantic web for reuse. The final model adds a substantial amount of knowledge about the yeast diauxic shift: 92 genes (+45%), and 1,048 interactions (+147%). This knowledge is also relevant to understanding cancer, the immune system, and aging. We conclude that systems biology software tools can be combined and integrated with laboratory robots in closed-loop cycles.


Subject(s)
Computational Biology; Gene Expression Regulation, Fungal; Robotics; Saccharomyces cerevisiae; Software; Systems Biology; Saccharomyces cerevisiae/genetics; Saccharomyces cerevisiae/metabolism
14.
J Cheminform ; 11(1): 68, 2019 Nov 12.
Article in English | MEDLINE | ID: mdl-33430958

ABSTRACT

The goal of quantitative structure activity relationship (QSAR) learning is to learn a function that, given the structure of a small molecule (a potential drug), outputs the predicted activity of the compound. We employed multi-task learning (MTL) to exploit commonalities in drug targets and assays. We used datasets containing curated records about the activity of specific compounds on drug targets provided by ChEMBL. In total, 1091 assays were analysed. As a baseline, a single task learning approach that trains random forest to predict drug activity for each drug target individually was considered. We then carried out feature-based and instance-based MTL to predict drug activities. We introduced a natural metric of evolutionary distance between drug targets as a measure of task relatedness. Instance-based MTL significantly outperformed both feature-based MTL and the base learner on 741 drug targets out of 1091. Feature-based MTL won on 179 occasions and the base learner performed best on 171 drug targets. We conclude that MTL QSAR is improved by incorporating the evolutionary distance between targets. These results indicate that QSAR learning can be performed effectively, even if little data is available for specific drug targets, by leveraging what is known about similar drug targets.

16.
Sci Rep ; 8(1): 1038, 2018 01 18.
Article in English | MEDLINE | ID: mdl-29348637

ABSTRACT

Malaria, caused by parasites of the genus Plasmodium, leads to over half a million deaths per year, 90% of which are caused by Plasmodium falciparum. P. vivax usually causes milder forms of malaria; however, P. vivax can remain dormant in the livers of infected patients for weeks or years before re-emerging in a new bout of the disease. The only drugs available that target all stages of the parasite can lead to severe side effects in patients with glucose-6-phosphate dehydrogenase (G6PD) deficiency; hence, there is an urgent need to develop new drugs active against blood and liver stages of the parasite. Different groups have demonstrated that triclosan, a common antibacterial agent, targets the Plasmodium liver enzyme enoyl reductase. Here, we provide 4 independent lines of evidence demonstrating that triclosan specifically targets both wild-type and pyrimethamine-resistant P. falciparum and P. vivax dihydrofolate reductases, classic targets for the blood stage of the parasite. This makes triclosan an exciting candidate for further development as a dual specificity antimalarial, which could target both liver and blood stages of the parasite.


Subject(s)
Antimalarials/pharmacology; Folic Acid Antagonists/pharmacology; Plasmodium/drug effects; Plasmodium/enzymology; Tetrahydrofolate Dehydrogenase/metabolism; Triclosan/pharmacology; Antimalarials/chemistry; Binding Sites; Enzyme Activation/drug effects; Folic Acid Antagonists/chemistry; Models, Molecular; Molecular Conformation; Protein Binding; Structure-Activity Relationship; Tetrahydrofolate Dehydrogenase/chemistry; Triclosan/chemistry
17.
Mach Learn ; 107(1): 285-311, 2018.
Article in English | MEDLINE | ID: mdl-31997851

ABSTRACT

We investigate the learning of quantitative structure activity relationships (QSARs) as a case-study of meta-learning. This application area is of the highest societal importance, as it is a key step in the development of new medicines. The standard QSAR learning problem is: given a target (usually a protein) and a set of chemical compounds (small molecules) with associated bioactivities (e.g. inhibition of the target), learn a predictive mapping from molecular representation to activity. Although almost every type of machine learning method has been applied to QSAR learning, there is no agreed single best way of learning QSARs, and therefore the problem area is well-suited to meta-learning. We first carried out the most comprehensive ever comparison of machine learning methods for QSAR learning: 18 regression methods, 3 molecular representations, applied to more than 2700 QSAR problems. (These results have been made publicly available on OpenML and represent a valuable resource for testing novel meta-learning methods.) We then investigated the utility of algorithm selection for QSAR problems. We found that this meta-learning approach outperformed the best individual QSAR learning method (random forests using a molecular fingerprint representation) by up to 13%, on average. We conclude that meta-learning outperforms base-learning methods for QSAR learning, and as this investigation is one of the most extensive comparisons of base and meta-learning methods ever made, it provides evidence for the general effectiveness of meta-learning over base-learning.
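
Algorithm selection, the form of meta-learning named in this abstract, can be pictured as a lookup from dataset descriptors to the method that worked best on similar datasets. The Python sketch below uses a nearest-neighbour rule as a stand-in for whatever meta-learner the authors trained; the meta-features, algorithm labels, and function name are hypothetical.

```python
import math


def select_algorithm(meta_examples, new_meta_features):
    """Return the algorithm that was best on the most similar known dataset.

    `meta_examples` is a list of (meta_feature_vector, best_algorithm_name)
    pairs, one per previously evaluated dataset. Similarity is plain
    Euclidean distance, so features are assumed to be on comparable scales.
    """
    _, best_algorithm = min(
        meta_examples,
        key=lambda example: math.dist(example[0], new_meta_features),
    )
    return best_algorithm
```

In practice the meta-features would need scaling, and a learned meta-model would usually replace the single nearest neighbour.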

18.
J R Soc Interface ; 14(128)2017 03.
Article in English | MEDLINE | ID: mdl-28250099

ABSTRACT

The theory of computer science is based around universal Turing machines (UTMs): abstract machines able to execute all possible algorithms. Modern digital computers are physical embodiments of classical UTMs. For the most important class of problem in computer science, non-deterministic polynomial complete problems, non-deterministic UTMs (NUTMs) are theoretically exponentially faster than both classical UTMs and quantum mechanical UTMs (QUTMs). However, no attempt has previously been made to build an NUTM, and their construction has been regarded as impossible. Here, we demonstrate the first physical design of an NUTM. This design is based on Thue string rewriting systems, and thereby avoids the limitations of most previous DNA computing schemes: all the computation is local (simple edits to strings) so there is no need for communication, and there is no need to order operations. The design exploits DNA's ability to replicate to execute an exponential number of computational paths in P time. Each Thue rewriting step is embodied in a DNA edit implemented using a novel combination of polymerase chain reactions and site-directed mutagenesis. We demonstrate that the design works using both computational modelling and in vitro molecular biology experimentation: the design is thermodynamically favourable, microprogramming can be used to encode arbitrary Thue rules, all classes of Thue rule can be implemented, and rules can be implemented non-deterministically. In an NUTM, the resource limitation is space, which contrasts with classical UTMs and QUTMs where it is time. This fundamental difference enables an NUTM to trade space for time, which is significant for both theoretical computer science and physics. It is also of practical importance, for to quote Richard Feynman 'there's plenty of room at the bottom'.
This means that a desktop DNA NUTM could potentially utilize more processors than all the electronic computers in the world combined, and thereby outperform the world's current fastest supercomputer, while consuming a tiny fraction of its energy.
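
To make the notion of non-deterministic Thue rewriting concrete, the Python sketch below enumerates every string reachable from a start string when each rule may be applied in either direction at any position. It is a software simulation only, with assumed function names and non-empty rule sides; it follows all branches one after another, so the set of strings can grow exponentially with the number of steps, which is the cost the DNA design proposes to absorb through replication.

```python
def rewrite_once(s, rules):
    """All strings obtained from s by one rule application at one position."""
    results = set()
    for lhs, rhs in rules:
        # A Thue rule is bidirectional: lhs may become rhs and vice versa.
        for source, target in ((lhs, rhs), (rhs, lhs)):
            start = s.find(source)
            while start != -1:
                results.add(s[:start] + target + s[start + len(source):])
                start = s.find(source, start + 1)
    return results


def reachable(start, rules, steps):
    """Every string reachable from `start` in at most `steps` rewrites."""
    frontier, seen = {start}, {start}
    for _ in range(steps):
        # Expand every branch of the computation, keeping only new strings.
        frontier = {t for s in frontier for t in rewrite_once(s, rules)} - seen
        seen |= frontier
    return seen
```

For example, with the single rule `("ab", "ba")`, the string `"aab"` reaches `"aba"` in one step and `"baa"` in two.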


Subject(s)
Algorithms; Computers, Molecular; Models, Theoretical
19.
Front Plant Sci ; 7: 133, 2016.
Article in English | MEDLINE | ID: mdl-26904088

ABSTRACT

Perennial ryegrass (Lolium perenne L.) is one of the most widely grown forage grasses in temperate agriculture. In order to maintain and increase its usage as forage in livestock agriculture, there is a continued need for improvement in biomass yield, quality, disease resistance, and seed yield. Genetic gain for traits such as biomass yield has been relatively modest. This has been attributed to its long breeding cycle, and the necessity to use population based breeding methods. Thanks to recent advances in genotyping techniques there is increasing interest in genomic selection (GS), from which genomically estimated breeding values are derived. In this paper we compare the classical RRBLUP model with state-of-the-art machine learning techniques that should lend themselves easily to use in GS and demonstrate their application to predicting quantitative traits in a breeding population of L. perenne. Prediction accuracies varied from 0 to 0.59 depending on trait, prediction model and composition of the training population. The BLUP model produced the highest prediction accuracies for most traits and training populations. Forage quality traits had the highest accuracies compared to yield related traits. There appeared to be no clear pattern to the effect of the training population composition on the prediction accuracies. The heritability of the forage quality traits was generally higher than for the yield related traits, and could partly explain the difference in accuracy. Some population structure was evident in the breeding populations, and probably contributed to the varying effects of training population on the predictions. The average linkage disequilibrium between adjacent markers ranged from 0.121 to 0.215. Higher marker density and a larger training population closely related to the test population are likely to improve the prediction accuracy.

20.
PLoS One ; 10(12): e0142494, 2015.
Article in English | MEDLINE | ID: mdl-26630677

ABSTRACT

Many advances in synthetic biology require the removal of a large number of genomic elements from a genome. Most existing deletion methods leave behind markers, and as there are a limited number of markers, such methods can only be applied a fixed number of times. Deletion methods that recycle markers generally are either imprecise (remove untargeted sequences), or leave scar sequences which can cause genome instability and rearrangements. No existing marker recycling method is automation-friendly. We have developed a novel openly available deletion tool that consists of: 1) a method for deleting genomic elements that can be repeatedly used without limit, is precise, scar-free, and suitable for automation; and 2) software to design the method's primers. Our tool is sequence agnostic and could be used to delete large numbers of coding sequences, promoter regions, transcription factor binding sites, terminators, etc. in a single genome. We have validated our tool on the deletion of non-essential open reading frames (ORFs) from S. cerevisiae. The tool is applicable to arbitrary genomes, and we provide primer sequences for the deletion of: 90% of the ORFs from the S. cerevisiae genome, 88% of the ORFs from the S. pombe genome, and 85% of the ORFs from the L. lactis genome.


Subject(s)
Cicatrix/genetics; Computational Biology/methods; Genome, Fungal; Open Reading Frames/genetics; Saccharomyces cerevisiae/genetics; Schizosaccharomyces/genetics; Sequence Deletion; Automation; Genomics/methods; Polymerase Chain Reaction; Software